{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--BOOK_INFORMATION-->\n",
    "<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PHydro-cover-small.png\">\n",
    "*This is the Jupyter notebook version of the [Python in Hydrology](http://www.greenteapress.com/pythonhydro/pythonhydro.html) by Sat Kumar Tomer.*\n",
    "*Source code is available at [code.google.com](https://code.google.com/archive/p/python-in-hydrology/source).*\n",
    "\n",
    "*The book is available under the [GNU Free Documentation License](http://www.gnu.org/copyleft/fdl.html). If you have comments, corrections or suggestions, please send email to satkumartomer@gmail.com.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Multivariate Distribution](10.02-Multivariate-Distribution.ipynb) | [Contents](Index.ipynb) | [Bias Correction](10.04-Bias-Correction.ipynb)>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10.3 Kriging\n",
    "\n",
    "克里金(Krigin)是水文学领域中广泛应用的一种插值方法。克里金优于其他插值方法，例如线性插值、径向函数法和样条插值法等，它提供了一种方法来估计插值参数，同时也提供了插值的不确定性。克里金方法包括两个步骤：(i)对变差图进行估计并对其进行理论变差图拟合，以及(ii)进行插值。经验变差图揭示了数据和采样间距的许多许多有趣的事情。研究这些方面是值得的。所以，首先我们开始通过不同类型的虚拟数据集来理解变差图。在本节中的例子来自P.K.Kitanidis的`Introduction to Geostatistics`一书。python库用于Kriginig是`ambhas.krige`。\n",
    "\n",
    "首先，我们考虑变异规模大于抽样规模的情况。考虑这个变量，\n",
    "\n",
    "<center>$z(x)=cos(2x/0.001)\\quad\\quad\\quad\\quad(10.1)$ </center>\n",
    "\n",
    "其中，$x$表示从范围在0到1之间的均分分布抽样100次。因为`ambhas.krige`库只适用于二维数据，我们将多生成一个除x之外的变量$y$。$y$值将是常量，只模拟一个维度的影响。生成虚拟数据的python代码是："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "module compiled against API version 0xb but this version of numpy is 0xa",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
      "\u001b[1;31mRuntimeError\u001b[0m: module compiled against API version 0xb but this version of numpy is 0xa"
     ]
    },
    {
     "ename": "ImportError",
     "evalue": "numpy.core.multiarray failed to import",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mImportError\u001b[0m                               Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-1-0f83d4d80973>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;31m# import required library\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      2\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpyplot\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mplt\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      4\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m \u001b[0mx\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrandom\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mrand\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;36m100\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;31m# uniformly distributed x\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\Users\\laihetao\\AppData\\Local\\conda\\conda\\envs\\DataProcess2\\lib\\site-packages\\matplotlib\\pyplot.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m     27\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mcycler\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mcycler\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     28\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 29\u001b[1;33m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcolorbar\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     30\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mmatplotlib\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mstyle\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     31\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mmatplotlib\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0m_pylab_helpers\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0minteractive\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\Users\\laihetao\\AppData\\Local\\conda\\conda\\envs\\DataProcess2\\lib\\site-packages\\matplotlib\\colorbar.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m     30\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     31\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mmpl\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 32\u001b[1;33m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0martist\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mmartist\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     33\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcbook\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mcbook\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     34\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcollections\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mcollections\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\Users\\laihetao\\AppData\\Local\\conda\\conda\\envs\\DataProcess2\\lib\\site-packages\\matplotlib\\artist.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m     13\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcbook\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mmplDeprecation\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     14\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mmatplotlib\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mdocstring\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mrcParams\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 15\u001b[1;33m from .transforms import (Bbox, IdentityTransform, TransformedBbox,\n\u001b[0m\u001b[0;32m     16\u001b[0m                          TransformedPath, Transform)\n\u001b[0;32m     17\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[1;33m.\u001b[0m\u001b[0mpath\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mPath\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32mC:\\Users\\laihetao\\AppData\\Local\\conda\\conda\\envs\\DataProcess2\\lib\\site-packages\\matplotlib\\transforms.py\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m     37\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[1;32mas\u001b[0m \u001b[0mnp\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     38\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mnumpy\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mma\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 39\u001b[1;33m from matplotlib._path import (affine_transform, count_bboxes_overlapping_bbox,\n\u001b[0m\u001b[0;32m     40\u001b[0m     update_path_extents)\n\u001b[0;32m     41\u001b[0m \u001b[1;32mfrom\u001b[0m \u001b[0mnumpy\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mlinalg\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0minv\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mImportError\u001b[0m: numpy.core.multiarray failed to import"
     ]
    }
   ],
   "source": [
    "# import required library\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "x = np.random.rand(100) # uniformly distributed x\n",
    "y = np.ones(x.shape)\n",
    "z = np.cos(2*np.pi*x/0.001)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们看看生成的虚拟数据。结果图如图10.3所示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.clf()\n",
    "plt.plot(x,z,'ro')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('z')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们从导入`ambhas.krige`库开始。我们使用其名称为`OK`的函数Oridinary Kriging来创建一个Kriging类的实例。为了查看变差函数，我们使用`variogram`方法。有两种类型的变量函数可用，第一个是原始变差图，即数据的平均不可执行，第二个是在均匀的滞后距离区间上的平均。让我们先看看原始的变差图是怎么样的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ambhas import krige\n",
    "foo = krige.OK(x,y,z)\n",
    "D,G = foo.variogram(var_type='scattered')\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(D,G,'.')\n",
    "plt.xlabel('lag distance')\n",
    "plt.ylabel('variogram')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "原始变差图如图10.4所示。这个变差图非常离散，并不能提供对数据行为的任何了解。为了观察数据的行为，我们使用平均变差函数。\n",
    "\n",
    "平均变差函数的估计(或简单的称为变差函数)是通过将`averaged`作为`variogram`方法的参数`var_type`的输入来实现。变差图如图10.5所示。该变差图看上去趋于水平。这意味着行为在任何滞后距离都是相同的，或变量(z)在任何尺度上都有相同的变化。这可能是因为变量就是如此，或者我们的抽样太差，以至于我们无法捕捉变量的行为。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "DE,GE = foo.variogram(var_type='averaged',min_lag=0.025)\n",
    "\n",
    "plt.clf()\n",
    "plot.plot(DE,GE,'ro')\n",
    "plt.xlabel('lag distance')\n",
    "plt.ylabel('variogram')\n",
    "plt.ylim((0,1))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们尝试另一个变量，它在比我们的采样更大规模上有变异性。我们保留相同的采样，并以下面的方式改变信号的变异性：\n",
    "<center>$z(x)=cos(2x/0.25)\\quad\\quad\\quad\\quad\\quad\\quad(10.2)$</center>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "x = np.random.rand(100) # uniformly distributed x\n",
    "y = np.ones(x.shape)\n",
    "z = np.cos(2*x/0.25)\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(x,z,'ro')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('z')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "信号如图10.6所示。在图中，我们看到数据显示了一个很好的模式。模式的可见性来自数据确实有一些模式或抽样足够好的。正如我们在上一个例子中所看见的，原始变差图是非常杂乱的，所以我们只绘制平均变差图。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "foo = krige.OK(x,y,z)\n",
    "DE,GE = foo.variogram(var_type='averaged', min_lag=0.025)\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(DE,GE, 'r--o')\n",
    "plt.xlabel('lag distance')\n",
    "plt.ylabel('variogram')\n",
    "plt.ylim((0,1))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "变差函数结果图如图10.7所示。变差图显示了清晰的趋势。变异性随着滞后距离的增加而增加。这意味着在更大的滞后距离上有更高的变异性。即该采样足够好捕捉数据的变异性。\n",
    "\n",
    "变差图在0.3的滞后距离附近稳定。这个距离大约是相关长度(在这个滞后距离滞后数据显示没有相互关系)。\n",
    "\n",
    "让我们再生成一个信号。图10.8显示了该信号。随着x的增加，变量(z)呈上升趋势，因此信号是非平稳的。这个变量变化的也很快，而且任何行为都不能从这个趋势中看出。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "x = np.random.rand(100) # uniformly distributed x\n",
    "y = np.ones(x.shape)\n",
    "z = x + 0.2*np.random.randn(len(x))\n",
    "plt.clf()\n",
    "\n",
    "plt.plot(x,z, 'ro')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('z')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "图10.8:z随着x的变化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "foo = krige.OK(x,y,z)\n",
    "DE, GE = foo.variogram(var_type='averaged', min_lag=0.05)\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(DE,GE, 'r--o')\n",
    "plt.xlabel('lag distance')\n",
    "plt.ylabel('variogram')\n",
    "plt.ylim(ymin=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们现在估计它的变差函数。变差函数如图10.9所示。该变差函数有以下两个明显的特征:\n",
    "\n",
    "- 变差函数在0滞后距离时并不接近于零。\n",
    "- 变差函数随着滞后距离恒定增长。\n",
    "- 变差函数并不稳定。\n",
    "\n",
    "以上三个要素有如下揭示:\n",
    "\n",
    "- 数据的采样比数据的变异性大。\n",
    "- 数据的范围不足以捕捉数据的变异性。\n",
    "- 数据是非固定的。\n",
    "\n",
    "我们移除z中的趋势，然后绘制变差函数图。去趋势后的数据如图10.10所示，并且变差函数如图10.11所示。现在的数据没有趋势，因为趋势被移除了。而且该数据看上去只是没有趋势的随机数。变差函数结果图也表明，数据在所有的滞后距离处具有相同的变异性，这意味着它是随机行为。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "z = z-x\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(x,z, 'ro')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('z')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "foo = krige.OK(x,y,z)\n",
    "DE, GE = foo.variogram(var_type='averaged', min_lag=0.05)\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(DE,GE, 'r--o')\n",
    "plt.xlabel('lag distance')\n",
    "plt.ylabel('variogram')\n",
    "plt.ylim(ymin=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，让我们对站点数据执行克里金插值，来绘制变量地图和非确定性估计。首先我们估计经验变差函数，然后通过命中和试验，将理论变差函数(在这个情况下为spherical)拟合。经验和理论拟合函数如图10.12所示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x = np.array([6.8, 4.3, 5.9, 11.6, 5.5, 10.8, 8.6, 12.6, 14.7, 13.9,\n",
    "9.47, 14.3, 8.9, 11.9, 11.75])\n",
    "y = np.array([6.4, 5.0, 6.0, 4.9, 2.7, 8.2, 3.9, 6.7, 10.4, 10.9, 5.6,\n",
    "11.0, 7.3, 6.7, 10.8])\n",
    "z = np.array([10, 11, 12, 9, 12, 8, 10, 8, 6, 6,\n",
    "10, 6, 8, 8, 6])\n",
    "foo = krige.OK(x,y,z)\n",
    "DE, GE = foo.variogram(var_type='averaged', min_lag=0.5)\n",
    "\n",
    "model_par = {'nugget':0, 'range':20, 'sill':6}\n",
    "\n",
    "lags = np.linspace(0,6)\n",
    "G = foo.vario_model(lags, model_par, 'spherical')\n",
    "\n",
    "plt.clf()\n",
    "plt.plot(DE,GE, 'rs', ms=10)\n",
    "plt.plot(lags, G, 'g', lw=3)\n",
    "plt.xlabel('lag distance')\n",
    "plt.ylabel('variogram')\n",
    "plt.ylim(ymin=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，首先我们生成所需的网格，在其上插入数据。这不只是网格格式，它可以在任何位置。但是为了制图的目的，需要网格格式的位置。`OK`类中的`krige`方法用于执行克里金插值。`krige`方法分别为Krigged值和方差生成两个属性`Zg`和`s2_k`。这两个属性都是一维数组，所以为了制图，我们需要将他们的形状转换为二维数组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "Xg, Yg = np.meshgrid(np.linspace(4,16), np.linspace(2,12))\n",
    "foo.krige(Xg, Yg, model_par, 'spherical')\n",
    "krig_z = foo.Zg\n",
    "var_z = foo.s2_k\n",
    "krig_z.shape = 50,50\n",
    "var_z.shape = 50,50\n",
    "\n",
    "plt.clf()\n",
    "plt.pcolor(Xg, Yg, krig_z)\n",
    "plt.colorbar()\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('y')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plt.clf()\n",
    "plt.pcolor(Xg, Yg, var_z)\n",
    "plt.plot(x,y, 'ro', ms=12)\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('y')\n",
    "plt.colorbar()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "图10.13显示了Krigged值的地图。图10.14显示了原始站点位置的变化情况。从图中可以很明显看出，站点附近误差的变化很小，而且它随着我们远离站点而增加，当周围都没有站点时，误差便得非常高。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
